Results 1-20 of 28
1.
J Acoust Soc Am ; 155(4): 2659-2669, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38634661

ABSTRACT

Within the realm of voice classification, singers can be subcategorized by the weight of their repertoire, the so-called "singer's Fach." However, the opposing terms "lyric" and "dramatic" singing are not yet well defined by their acoustic and articulatory characteristics. Nine professional singers of different Fach categories were asked to sing a diatonic scale on the vowel /a/, first in what they considered a lyric manner and then in what they considered a dramatic manner. Images were recorded using real-time magnetic resonance imaging (MRI) at 25 frames/s, and the audio signal was recorded via an optical microphone system. The analysis covered sound pressure level (SPL), vibrato amplitude and frequency, resonance frequencies, and the articulatory settings of the vocal tract. It revealed three primary differences between dramatic and lyric singing: dramatic singing was associated with greater SPL, greater vibrato amplitude and frequency, and lower resonance frequencies. The higher SPL indicates changes in the voice source, and the lower resonance frequencies are probably caused by a lower larynx position. However, all these strategies showed considerable individual variability. The singer's Fach might contribute to perceptual differences even for the same singer, depending on the respective repertoire.


Subjects
Music, Singing, Voice Quality, Acoustics
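The analysis in record 1 includes vibrato amplitude and frequency. As a rough illustration of how such measures are often derived from an f0 contour (not necessarily the procedure used in the study), the Python sketch below converts f0 to cents and reads the vibrato rate and semi-amplitude (extent) off the strongest spectral peak. The function name, the 3-9 Hz search band, and the single-FFT-peak approach are assumptions.

```python
import numpy as np


def vibrato_rate_extent(f0_hz, frame_rate, band=(3.0, 9.0)):
    """Estimate vibrato rate (Hz) and extent (cents, semi-amplitude) from an f0 track.

    Sketch only: the 3-9 Hz search band is an assumed default, and a single FFT
    peak assumes a steady vibrato over the whole track.
    """
    # Express f0 in cents and remove the mean pitch, leaving the modulation
    cents = 1200.0 * np.log2(np.asarray(f0_hz, dtype=float))
    cents -= cents.mean()
    n = len(cents)
    spectrum = np.fft.rfft(cents)
    freqs = np.fft.rfftfreq(n, d=1.0 / frame_rate)
    # Strongest spectral peak inside the assumed vibrato band
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak = np.flatnonzero(in_band)[np.argmax(np.abs(spectrum[in_band]))]
    rate = freqs[peak]
    # A sinusoid of amplitude A that fits an integer number of cycles gives |X[k]| = A * n / 2
    extent = 2.0 * np.abs(spectrum[peak]) / n
    return rate, extent
```

For sung scales, where pitch steps and vibrato overlap, a frame-wise or cycle-by-cycle estimate would likely be more appropriate than one global peak.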
2.
J Vis Exp ; (203)2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38251763

ABSTRACT

This study aims to develop super-soft, non-sticky vocal fold models for voice research. The conventional manufacturing process of silicone-based vocal fold models yields models with undesirable properties, such as stickiness and poor reproducibility. Such models are also prone to rapid aging, leading to poor comparability across different measurements. In this study, we propose a modification to the manufacturing process that changes the order in which the silicone material is layered, which leads to non-sticky and highly consistent vocal fold models. We also compare a model produced using this method with a conventionally manufactured vocal fold model that is adversely affected by its sticky surface. We detail the manufacturing process and characterize the properties of the models for potential applications. The outcomes demonstrate the efficacy of the modified fabrication method and highlight the superior qualities of our non-sticky vocal fold models. The findings contribute to the development of realistic and reliable vocal fold models for research and clinical applications.


Subjects
Data Accuracy, Vocal Cords, Reproducibility of Results, Silicones
3.
J Speech Lang Hear Res ; : 1-15, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37971432

ABSTRACT

PURPOSE: Breathing is ubiquitous in speech production, crucial for structuring speech, and a potential diagnostic indicator for respiratory diseases. However, the acoustic characteristics of speech breathing remain underresearched. This work aims to characterize the spectral properties of human inhalation noises in a large speaker sample and to explore their potential similarities with speech sounds. Because speech sounds are mostly realized with egressive airflow, we also investigated the effect of airflow direction (inhalation vs. exhalation) on the acoustic properties of certain vocal tract (VT) configurations. METHOD: To characterize human inhalation, we describe spectra of breath noises produced by human speakers from two data sets comprising 34 female and 100 male participants. To investigate the effect of airflow direction, we used three-dimensional-printed VT models of a male and a female speaker with static VT configurations of four vowels and four fricatives. An airstream was directed through these VT configurations in both directions, and the spectral consequences were analyzed. RESULTS: For human inhalations, we found spectra with a decreasing slope and several weak peaks below 3 kHz. These peaks show moderate (female) to strong (male) overlap with resonances found for participants inhaling with a VT configuration of a central vowel. Results for the VT models suggest that airflow direction is crucial for the spectral properties of sibilants, /ç/, and /i:/, but not for the other sounds we investigated. Inhalation noise is most similar to /ə/, for which airflow direction does not play a role. CONCLUSIONS: Inhalation is realized on ingressive airflow, and inhalation noises have specific resonance properties that are most similar to /ə/ but occur without phonation. Airflow direction does not play a role in this specific VT configuration, but subglottal resonances may. For future work, we suggest investigating the articulation of speech breathing and linking it to current work on pause postures. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24520585.

4.
J Acoust Soc Am ; 153(6): 3281, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37307363

ABSTRACT

This study investigated how the bandwidths of resonances simulated by transmission-line models of the vocal tract compare to bandwidths measured from physical three-dimensional printed vowel resonators. Three types of physical resonators were examined: models with realistic vocal tract shapes based on magnetic resonance imaging (MRI) data, straight axisymmetric tubes with varying cross-sectional areas, and two-tube approximations of the vocal tract with notched lips. All physical models had hard walls and a closed glottis, so the main loss mechanisms contributing to the bandwidths were sound radiation, viscosity, and heat conduction. These losses were accordingly included in the simulations in two variants: a coarse approximation of the losses with frequency-independent lumped elements, and a detailed, theoretically more precise loss model. Across the examined frequency range from 0 to 5 kHz, the resonance bandwidths increased systematically from the simulations with the coarse loss model, to the simulations with the detailed loss model, to the tube-shaped physical resonators, and finally to the MRI-based resonators. This indicates that the simulated losses, especially the commonly used approximations, underestimate the real losses in physical resonators. Hence, more realistic acoustic simulations of the vocal tract require improved models for viscous and radiation losses.


Subjects
Acoustics, Glottis, Vibration, Viscosity
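To make the notion of a transmission-line (tube-chain) model in record 4 concrete, here is a minimal lossless Python sketch. Each tube section is a 2x2 chain matrix, the glottis end is driven by a volume velocity (closed-glottis condition), and the lips are treated as an ideal open end with zero pressure. The air density, sound speed, and function names are assumptions. Because the sketch omits exactly the radiation, viscous, and thermal losses the study examines, it yields resonance frequencies but zero bandwidths.

```python
import numpy as np

RHO = 1.14   # air density in kg/m^3 (assumed value for warm, humid air)
C = 350.0    # speed of sound in m/s (assumed)


def chain_matrix(freq, lengths, areas):
    """Product of lossless tube-section chain matrices, ordered glottis -> lips.

    Relates [pressure, volume velocity] at the glottis to the same pair at the lips.
    """
    k = 2.0 * np.pi * freq / C
    M = np.eye(2, dtype=complex)
    for length, area in zip(lengths, areas):
        z0 = RHO * C / area  # characteristic acoustic impedance of the section
        kl = k * length
        section = np.array([[np.cos(kl), 1j * z0 * np.sin(kl)],
                            [1j * np.sin(kl) / z0, np.cos(kl)]])
        M = M @ section
    return M


def resonances(lengths, areas, fmax=5000.0, df=1.0):
    """Closed-glottis resonances (Hz) below fmax, assuming zero pressure at the lips.

    With p_lips = 0, U_glottis = M[1, 1] * U_lips, so U_lips / U_glottis = 1 / M[1, 1]
    and the resonances are the zeros of M[1, 1] (a real number in the lossless case).
    """
    freqs = np.arange(df, fmax, df)
    d = np.array([chain_matrix(f, lengths, areas)[1, 1].real for f in freqs])
    idx = np.flatnonzero(np.sign(d[:-1]) != np.sign(d[1:]))
    # Linear interpolation between the grid points that bracket each zero crossing
    return freqs[idx] - d[idx] * df / (d[idx + 1] - d[idx])
```

For a uniform 17 cm tube (split here into two equal sections), this reproduces the quarter-wavelength series c(2n - 1)/(4L), roughly 515, 1544, and 2574 Hz and so on. Adding frequency-dependent loss terms to each section would be the step toward the bandwidth comparison described in the abstract.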
5.
J Voice ; 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36966126

ABSTRACT

In this study, silicone vocal fold models with different geometries were manufactured from the common silicone brand EcoFlex 00-30 with typical oil mixing ratios. However, to attain the softness of human vocal folds, the proportions of oil typically used exceed the manufacturer's recommended limit. This additional oil has direct effects on the silicone, such as shrinkage, stickiness, evaporation, embrittlement, and uneven vulcanization. This study investigated the impact of these effects on the oscillation characteristics of the silicone vocal fold models and how they change over time. The goal was to examine the comparability of the produced silicone vocal fold models and of the results obtained from experiments performed with them. For the manufactured models, the phonation onset pressure, offset pressure, mean volume velocity, pulmonary power, fundamental frequency, and measures of the glottal area waveform were collected over a period of up to 8 weeks. The results showed that the data for the models were highly scattered. Furthermore, over time the phonation onset and offset pressures increased, causing some models to fail to oscillate, and the glottal area waveform also changed. In conclusion, the characteristics of over-thinned silicone vocal fold models depend strongly on the time of measurement. It is therefore recommended to carefully consider the effects of oil oversaturation and the timing of measurements when using silicone vocal fold models in experiments.

6.
PLoS One ; 18(2): e0281877, 2023.
Article in English | MEDLINE | ID: mdl-36795744

ABSTRACT

In this study, 23 subjects produced cyclic transitions between rounded and unrounded vowels, as in /o-i-o-i-o-…/, at two specific speaking rates. Rounded vowels are typically produced with a lower larynx position than unrounded vowels. This contrast in vertical larynx position was further amplified by producing the unrounded vowels with a higher pitch than the rounded vowels. The vertical larynx movements of each subject were measured by means of object tracking in laryngeal ultrasound videos. The results indicate that larynx lowering was on average 26% faster than larynx raising, and that this velocity difference was more pronounced in women than in men. Possible reasons for this are discussed with a focus on specific biomechanical properties. The results can help to interpret vertical larynx movements with regard to underlying neural control and aerodynamic conditions, and to improve movement models for articulatory speech synthesis.


Subjects
Larynx, Speech, Male, Female, Humans, Phonetics, Larynx/diagnostic imaging, Movement, Videotape Recording
7.
IEEE Trans Neural Netw Learn Syst ; 34(10): 7648-7659, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35120012

ABSTRACT

Echo state networks (ESNs) are a special type of recurrent neural network (RNN) in which the input and recurrent connections are traditionally generated randomly and only the output weights are trained. Despite the recent success of ESNs in various audio, image, and radar recognition tasks, we postulate that a purely random initialization is not the ideal way to initialize ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the K-means algorithm on the training data. We show that for a large variety of datasets, this initialization performs as well as or better than a randomly initialized ESN while needing significantly fewer reservoir neurons. Furthermore, we discuss how this approach makes it possible to estimate a suitable reservoir size based on prior knowledge about the data.
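The core idea of record 7 can be sketched in Python with NumPy. The training inputs are clustered into as many clusters as there are reservoir neurons, and the unit-normalized centroids become the rows of the input weight matrix. The recurrent weights stay random, and only a ridge-regression readout is trained. This is an illustrative reading of the abstract, not the authors' code. The class and function names, the centroid normalization, the tanh reservoir without leak or bias, and all default hyperparameters are assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, n_features) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


class KMeansESN:
    """Echo state network with k-means-derived input weights (hypothetical sketch)."""

    def __init__(self, n_reservoir, spectral_radius=0.9, ridge=1e-6, seed=0):
        self.n = n_reservoir
        self.spectral_radius = spectral_radius
        self.ridge = ridge
        self.seed = seed

    def _states(self, U):
        # Run the reservoir from a zero state and collect one state per time step
        x = np.zeros(self.n)
        states = np.empty((len(U), self.n))
        for t, u in enumerate(U):
            x = np.tanh(self.W_in @ u + self.W @ x)
            states[t] = x
        return states

    def fit(self, U, Y):
        # Input weights: one unit-norm centroid per reservoir neuron instead of a random draw
        C = kmeans(U, self.n, seed=self.seed)
        self.W_in = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-12)
        # Recurrent weights remain random, rescaled to the requested spectral radius
        rng = np.random.default_rng(self.seed + 1)
        W = rng.uniform(-1.0, 1.0, (self.n, self.n))
        self.W = W * (self.spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))
        # Only the linear readout is trained (ridge regression on reservoir states)
        X = self._states(U)
        A = X.T @ X + self.ridge * np.eye(self.n)
        self.W_out = np.linalg.solve(A, X.T @ Y)
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out
```

Usage: `KMeansESN(n_reservoir=40).fit(U, Y).predict(U_test)`, with `U` shaped (time, features) and `Y` shaped (time, outputs).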

8.
Sci Rep ; 12(1): 4192, 2022 03 09.
Article in English | MEDLINE | ID: mdl-35273225

ABSTRACT

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave radar hardware to measure the changes in the transmission spectra during speech between three antennas, located on both cheeks and the chin, with a measurement update rate of 100 Hz. We then recorded a command word corpus of 40 phonetically balanced, two-syllable German words and the German digits zero to nine for two individual speakers and evaluated both the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory network. We obtained recognition accuracies of 99.17% and 88.87% for the speaker-dependent multi-session and inter-session conditions, respectively. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across different sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.


Subjects
Speech Perception, Speech, Language, Radar, Recognition (Psychology)
9.
J Acoust Soc Am ; 151(1): 45, 2022 01.
Article in English | MEDLINE | ID: mdl-35105025

ABSTRACT

Sixteen subjects produced periodic repetitions of laryngeal adduction and abduction gestures. The movement of the cuneiform tubercles was tracked over time in laryngoscopic recordings of these utterances. The adduction and abduction velocities were determined objectively by means of a piecewise linear model fitted to the cuneiform tubercle trajectories. Abduction was found to be significantly faster than adduction. This was interpreted in terms of biomechanics and active control by the nervous system. The biomechanical properties could be responsible for an abduction velocity up to 51% higher than the adduction velocity. Additionally, the adduction velocity may be actively limited to prevent an overshoot of the intended degree of adduction when the vocal folds are approximated to initiate phonation.


Subjects
Gestures, Larynx, Humans, Larynx/diagnostic imaging, Movement, Phonation/physiology, Vocal Cords/physiology
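Record 9 derives velocities from a piecewise linear model fitted to the cuneiform tubercle trajectories. A minimal Python sketch, assuming a single transition modeled as flat-ramp-flat and fitted by brute-force search over breakpoints; the study's exact model and fitting procedure may differ, and the function name is hypothetical. The fitted slope of the middle segment is the movement velocity, in position units per time unit.

```python
import numpy as np


def fit_ramp(t, y):
    """Fit y ~ c + s * (clip(t, t1, t2) - t1) by exhaustive search over breakpoints.

    Returns (t1, t2, s), where s is the slope of the middle segment (the velocity).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for i in range(len(t) - 1):
        for j in range(i + 1, len(t)):
            # Continuous flat-ramp-flat basis for breakpoints t[i] < t[j]
            ramp = np.clip(t, t[i], t[j]) - t[i]
            A = np.column_stack([np.ones_like(t), ramp])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((A @ coef - y) ** 2))
            if best is None or sse < best[0]:
                best = (sse, t[i], t[j], coef[1])
    _, t1, t2, slope = best
    return t1, t2, slope
```

Fitting one abduction and one adduction phase separately and comparing the absolute slopes would give a velocity ratio of the kind the abstract reports (abduction up to 51% faster).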
10.
IEEE Trans Biomed Eng ; 69(1): 356-365, 2022 01.
Article in English | MEDLINE | ID: mdl-34214033

ABSTRACT

OBJECTIVE: Stroke survivors commonly suffer from dysphagia originating from oro-facial impairments that affect swallowing function. Functional therapy often employs tongue exercises that require the patient to perform short motion sequences. Evaluating the patient's performance on these exercises is difficult because there is no reliable form of visual feedback. METHODS: We propose an optopalatographic device that does not require a personalized dental retainer and is capable of measuring tongue movement trajectories intraorally. The device features nine optical proximity sensors sampled at 100 Hz and is attached to the hard palate with a specifically developed palatal adhesive. The sensing capabilities of the device were evaluated on a tongue gesture corpus recorded from nine healthy individuals, containing eight different tongue exercises commonly used in functional dysphagia therapy. RESULTS: The measured tongue trajectories contained temporally and spatially resolved information about tongue movement and location during each exercise. Furthermore, a simple DTW-kNN classifier (dynamic time warping with k-nearest neighbors) was able to distinguish the exercises from one another with average classification accuracies of 97.9% and 61.4% (cross-validation and inter-speaker test accuracy, respectively). CONCLUSION: The device can provide real-time feedback on tongue motion, and we obtained promising gesture recognition results with relatively few sensors, even in the absence of a personalized dental retainer. SIGNIFICANCE: Non-personalized optopalatography is readily available and could aid in improving functional dysphagia therapy by providing visual feedback to both the physician and the patient.


Subjects
Deglutition Disorders, Deglutition, Deglutition Disorders/diagnosis, Deglutition Disorders/etiology, Deglutition Disorders/therapy, Humans, Pressure, Prospective Studies, Tongue
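The DTW-kNN classifier named in record 10 can be sketched generically in Python. Dynamic time warping (DTW) gives a distance between two multichannel sequences of possibly different lengths, and a k-nearest-neighbor vote over labeled training sequences picks the class. For the described device, each frame would presumably be a vector of nine sensor readings, i.e. an array of shape (T, 9). This is a textbook version assumed for illustration, not the paper's implementation. The function names, the Euclidean frame distance, the absence of a warping window, and the choice of k are all illustrative.

```python
from collections import Counter

import numpy as np


def dtw_distance(a, b):
    """Dynamic time warping distance with Euclidean frame cost (no warping window)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as a single-channel sequence of shape (T, 1)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best of match, insertion, and deletion steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def knn_predict(query, train_seqs, train_labels, k=1):
    """Majority vote among the k training sequences closest to `query` under DTW."""
    dists = [dtw_distance(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The quadratic DTW table is fine for short exercise recordings; for longer sequences a band constraint (for example, a Sakoe-Chiba window) is the usual way to cut the cost.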
11.
J Acoust Soc Am ; 150(2): 1209, 2021 08.
Article in English | MEDLINE | ID: mdl-34470273

ABSTRACT

When pitch is explicitly modeled for parametric speech synthesis, microprosodic variations of the fundamental frequency f0 are usually disregarded by current intonation models. While there are numerous studies on the nature and origin of microprosody, little research has been done on its audibility and its effect on the naturalness of synthetic speech. In this work, the influence of obstruent-related microprosodic variations on the perceived naturalness of articulatory speech synthesis was studied. A small corpus of 20 German words and sentences was re-synthesized using the state-of-the-art articulatory synthesizer VocalTractLab. The pitch contours of the real utterances were extracted and fitted with the Target Approximation Model. After the real microprosodic variations were removed from the obtained pitch contours, synthetic variations were applied based on a microprosody model. Subsequently, multiple stimuli with different microprosody amplitudes were synthesized and evaluated in a listening experiment. The results indicate that microprosodic variations are barely audible but can, in certain cases, increase the perceived naturalness of the synthesized speech.


Subjects
Speech Perception, Language, Speech, Speech Acoustics
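The Target Approximation Model mentioned in record 11 describes f0 within each syllable as a critically damped approach toward a linear pitch target m·t + b. The Python sketch below uses the common third-order formulation from the quantitative Target Approximation (qTA) model: f0(t) = m·t + b + (c1 + c2·t + c3·t²)·e^(-λt). The coefficients are fixed by the f0 value, velocity, and acceleration carried over from the previous segment. Whether VocalTractLab uses exactly these equations is an assumption here, and the function and parameter names are hypothetical.

```python
import numpy as np


def tam_segment(t, m, b, lam, f0_init, v_init=0.0, a_init=0.0):
    """f0 contour of one segment under a third-order target approximation model.

    t       : time since segment onset (s)
    m, b    : slope and offset of the linear pitch target m*t + b
    lam     : rate of approach (1/s); larger values reach the target faster
    f0_init, v_init, a_init : f0, its velocity, and its acceleration at t = 0
    """
    t = np.asarray(t, dtype=float)
    # Choose c1..c3 so that f0(0), f0'(0), and f0''(0) match the initial state
    c1 = f0_init - b
    c2 = v_init + lam * c1 - m
    c3 = (a_init + 2.0 * lam * c2 - lam ** 2 * c1) / 2.0
    return m * t + b + (c1 + c2 * t + c3 * t ** 2) * np.exp(-lam * t)
```

Chaining segments by passing each segment's final f0, velocity, and acceleration as the next segment's initial state yields a smooth contour. The microprosodic perturbations studied in the paper would be added on top of such a contour.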
12.
Sci Adv ; 7(34)2021 Aug.
Article in English | MEDLINE | ID: mdl-34407948

ABSTRACT

Early detection of malign patterns in patients' biological signals can save millions of lives. Despite the steady improvement of artificial intelligence-based techniques, their practical clinical application is mostly constrained to offline evaluation of patients' data. Previous studies have identified organic electrochemical devices as ideal candidates for biosignal monitoring. However, their use for real-time pattern recognition had never been demonstrated. Here, we produce and characterize brain-inspired networks composed of organic electrochemical transistors and use them for time-series prediction and classification tasks using the reservoir computing approach. To show their potential for biofluid monitoring and biosignal analysis, we classify four classes of arrhythmic heartbeats with an accuracy of 88%. The results of this study introduce a previously unexplored paradigm for biocompatible computational platforms and may enable the development of ultralow-power, hardware-based artificial neural networks capable of interacting with body fluids and biological tissues.

13.
J Acoust Soc Am ; 149(1): 466, 2021 01.
Article in English | MEDLINE | ID: mdl-33514162

ABSTRACT

The influence of non-smooth trachea walls on the phonation onset and offset pressures and on the fundamental frequency of oscillation was experimentally investigated for three different synthetic vocal fold models. Three trachea models were compared: a cylindrical tube (smooth walls) and wavy-walled tubes with ripple depths of 1 and 2 mm. Threshold pressures for the onset and offset of phonation were measured at the lower and upper ends of each trachea tube. All measurements were performed both with and without a supraglottal resonator. While the fundamental frequency was not affected by non-smooth trachea walls, the phonation onset and offset pressures measured directly below the glottis decreased with increasing ripple depth of the trachea walls (by up to 20% for 2 mm ripples). This effect was independent of the type of glottis model and of the presence of a supraglottal resonator. The pressures at the lower end of the trachea and the average volume velocities also tended to decrease with increasing ripple depth, but to a much smaller extent. These results indicate that the subglottal geometry and the flow conditions in the trachea can substantially affect the oscillation of synthetic vocal folds.

14.
J Acoust Soc Am ; 150(6): 4191, 2021 12.
Article in English | MEDLINE | ID: mdl-34972262

ABSTRACT

Resonance strategies with respect to vocal registers, i.e., frequency ranges of uniform, demarcated voice quality, for the highest part of the female voice are still not completely understood. The first and second vocal tract resonances usually determine vowel identity. If the fundamental frequency exceeds the vowel-shaping resonance frequencies of speech, vocal tract resonances are tuned to voice source partials. It has not yet been clarified whether such tuning applies to the entire voice range, particularly to the top pitches. We investigated professional sopranos who regularly sing pitches above C6 (1047 Hz). Dynamic three-dimensional (3D) magnetic resonance imaging was used to calculate resonances for pitches from C5 (523 Hz) to C7 (2093 Hz) with different vowel configurations ([a:], [i:], [u:]) and in different contexts (scales or octave jumps). A spectral analysis and an acoustic analysis of 3D-printed vocal tract models were conducted. The results suggest that there is no exclusive register-defining resonance strategy. The intersection of the fundamental frequency and the first vocal tract resonance did not necessarily indicate a register shift. Either the articulators and the vocal tract resonances were kept without significant adjustments, or fR1:fo tuning, in which the first vocal tract resonance enhances the fundamental frequency, was applied up to F6 (1396 Hz). No fR2:fo tuning was observed.


Subjects
Singing, Acoustics, Female, Humans, Magnetic Resonance Imaging, Phonation, Voice Quality
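The pitch labels in record 14 follow twelve-tone equal temperament with A4 = 440 Hz, i.e. f = 440 · 2^((n - 69)/12) for MIDI note number n. The small Python helper below reproduces the quoted values, which the abstract gives as whole numbers. The function name is hypothetical, and the helper assumes scientific pitch notation with C4 as middle C.

```python
def note_to_hz(name, octave, a4=440.0):
    """Equal-tempered frequency of a note, e.g. note_to_hz("C", 6) for C6."""
    semitone = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}[name]
    # MIDI numbering under scientific pitch notation: C4 = 60, A4 = 69
    midi = 12 * (octave + 1) + semitone
    return a4 * 2.0 ** ((midi - 69) / 12.0)
```

For example, C5 is about 523.3 Hz, C6 about 1046.5 Hz, F6 about 1396.9 Hz, and C7 about 2093.0 Hz. So fR1:fo tuning up to F6 implies a first resonance near 1.4 kHz.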
15.
JASA Express Lett ; 1(7): 075203, 2021 07.
Article in English | MEDLINE | ID: mdl-36154640

ABSTRACT

This study compared the f0 of 14 German vowels in monosyllabic words (/dVt/) embedded in carrier sentences produced by 30 native speakers and 30 Mandarin Chinese learners. Appropriate techniques were employed to measure f0 values robustly and to analyze f0 profiles reliably. The results showed that, compared with German natives, Mandarin learners produced vowels bearing sentence stress with significantly larger f0 ranges and steeper f0 slopes but comparable f0 means and maxima. Moreover, lax vowels produced by both groups showed narrower ranges with faster f0 changes than tense vowels, an effect that was stronger for Mandarin learners.


Subjects
Language, China, Time Factors
16.
J Speech Lang Hear Res ; 63(12): 4252-4264, 2020 12 14.
Article in English | MEDLINE | ID: mdl-33170762

ABSTRACT

Purpose: Psychoacoustical studies on the transmission characteristics of bone-conducted (BC) speech, as perceived by speakers during vocalization, are important for further understanding the relationship between speech production and perception, especially auditory feedback. To explore how the outer ear contributes to BC speech transmission, this article aims to measure the transmission characteristics of bone conduction, focusing on the vibration of the regio temporalis (RT) and sound radiation in the ear canal (EC) due to excitation in the oral cavity (OC). Method: While an excitation signal was presented through a loudspeaker located in the enclosed cavity below the hard palate, transmitted signals were measured on the RT and in the EC. The transfer functions of the RT vibration and EC sound pressure relative to the OC sound pressure were determined from the measured signals using the sweep-sine method. Results: Our findings from the measurements of five participants are as follows: (a) the transfer function of the RT vibration relative to the OC sound pressure attenuated the frequency components above 1 kHz, and (b) the transfer function of the EC relative to the OC sound pressure emphasized the frequency components between 2 and 3 kHz. Conclusions: The vibration of the soft tissue or the skull bone has a low-pass filtering effect, whereas sound radiation in the EC has a 2-3 kHz band-pass filtering effect. Considering the perceptual effect of low-pass filtering in BC speech, our findings suggest that transmission to the outer ear may not be a dominant contributor to BC speech perception during vocalization.


Subjects
Bone Conduction, Speech, Auditory Threshold, Humans, Mouth, Skull, Vibration
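The sweep-sine method in record 16 can be illustrated with a minimal Python sketch. An exponential sine sweep serves as the excitation, and the transfer function of a measured signal relative to a reference signal is obtained by regularized spectral division. In this study, the measured signal would be the ear canal or temporal-bone signal and the reference the oral cavity pressure. The function names, the regularization constant, and the division-based estimator are assumptions; the study may have used a different deconvolution, such as convolution with an inverse sweep.

```python
import numpy as np


def exp_sweep(f1, f2, duration, fs):
    """Exponential (logarithmic) sine sweep from f1 to f2 Hz."""
    t = np.arange(int(duration * fs)) / fs
    rate = duration / np.log(f2 / f1)
    # Instantaneous frequency f1 * exp(t / rate) reaches f2 at t = duration
    return np.sin(2.0 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))


def transfer_function(reference, response, fs, reg=1e-8):
    """Estimate H(f) = Response(f) / Reference(f) by regularized spectral division.

    Zero-padding to at least the linear-convolution length avoids circular wrap-around.
    `reg` (relative to the peak reference power) keeps bins with little sweep energy
    from blowing up.
    """
    n = len(reference) + len(response) - 1
    nfft = 1 << (n - 1).bit_length()
    X = np.fft.rfft(reference, nfft)
    Y = np.fft.rfft(response, nfft)
    power = np.abs(X) ** 2
    H = Y * np.conj(X) / (power + reg * power.max())
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, H
```

Outside the swept band the reference carries almost no energy, so only bins between the sweep's start and end frequencies should be interpreted.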
17.
Sci Data ; 7(1): 255, 2020 08 05.
Article in English | MEDLINE | ID: mdl-32759947

ABSTRACT

A detailed understanding of how the acoustic patterns of speech sounds are generated by the complex 3D shapes of the vocal tract is a major goal in speech research. The Dresden Vocal Tract Dataset (DVTD) presented here contains geometric and (aero)acoustic data of the vocal tract of 22 German speech sounds (16 vowels, 5 fricatives, 1 lateral), each from one male and one female speaker. The data include the 3D Magnetic Resonance Imaging data of the vocal tracts, the corresponding 3D-printable and finite-element models, and their simulated and measured acoustic and aerodynamic properties. The dataset was evaluated in terms of the plausibility and the similarity of the resonance frequencies determined by the acoustic simulations and measurements, and in terms of the human identification rate of the vowels and fricatives synthesized by the artificially excited 3D-printed vocal tract models. According to both the acoustic and perceptual metrics, most models are accurate representations of the intended speech sounds and can be readily used for research and education.


Subjects
Acoustics, Magnetic Resonance Imaging, Phonetics, Three-Dimensional Printing, Female, Finite Element Analysis, Humans, Language, Male
18.
J Acoust Soc Am ; 148(1): EL112, 2020 07.
Article in English | MEDLINE | ID: mdl-32752753

ABSTRACT

This study analyzed the durational and spectral differences and their interaction in the production of seven German tense-lax vowel pairs between 30 German native speakers and 30 Mandarin learners of German. The results showed that Mandarin speakers differed significantly from the German speakers in producing the German tense-lax contrast. The general pattern was that Mandarin learners employed temporal features more strongly than spectral features to indicate the tense-lax contrast as compared to German speakers. The phonetic influences of the Mandarin language on the production of German tense and lax vowels are discussed.


Subjects
Language, Speech Perception, Acoustics, China, Humans, Phonetics, Speech Acoustics
19.
J Acoust Soc Am ; 146(1): 223, 2019 07.
Article in English | MEDLINE | ID: mdl-31370636

ABSTRACT

The estimation of formant frequencies from acoustic speech signals is mostly based on linear predictive coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may differ substantially from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas was systematically examined, both with physical vocal tract models excited by a self-oscillating rubber model of the vocal folds and by computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels were analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels and more moderate for mid to open vowels.
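For context on the LPC-based formant estimates discussed in record 19, the standard autocorrelation-method pipeline can be sketched in a few lines of Python. The steps are: window the signal, compute the autocorrelation, solve for the predictor coefficients with the Levinson-Durbin recursion, and convert the complex roots of the predictor polynomial into frequencies. The Hamming window, the model order, and the function names are assumptions. Practical formant trackers usually add pre-emphasis and discard roots with very large bandwidths, which this sketch omits.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a = [1, a1, ..., ap] for A(z)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error power
    return a


def lpc_formants(x, fs, order):
    """Formant frequency estimates (Hz) from the roots of the LPC polynomial."""
    x = np.asarray(x, dtype=float) * np.hamming(len(x))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = levinson_durbin(r, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root per complex-conjugate pair
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))
```

Because the all-pole model attributes the whole spectral envelope to the filter, any source-filter interaction during the open phase ends up in these estimates. This is the mechanism behind the F1 shift with peak glottal area reported above.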

20.
PLoS One ; 13(3): e0193708, 2018.
Article in English | MEDLINE | ID: mdl-29543829

ABSTRACT

Recently, 3D printing has been increasingly used to create physical models of the vocal tract with geometries obtained from magnetic resonance imaging. These printed models allow measuring the vocal tract transfer function, which cannot be done reliably in living humans. The transfer functions enable the detailed examination of the acoustic effects of specific articulatory strategies in speaking and singing, and the validation of acoustic plane-wave models for realistic vocal tract geometries in articulatory speech synthesis. Two techniques have been described for measuring the acoustic transfer function of 3D-printed models: (1) excitation of the models with a broadband sound source at the glottis and measurement of the sound pressure radiated from the lips, and (2) excitation of the models with an external source in front of the lips and measurement of the sound pressure inside the models at the glottal end. The former method is more frequently used and more intuitive due to its similarity to speech production. However, the latter method avoids the intricate problem of constructing a suitable broadband glottal source and is therefore more effective. It has been shown to yield a transfer function similar, but not exactly equal, to the volume velocity transfer function between the glottis and the lips, which is usually used to characterize vocal tract acoustics. Here, we revisit this method and show, both theoretically and experimentally, how it can be extended to yield the precise volume velocity transfer function of the vocal tract.


Subjects
Biological Models, Vocal Cords/anatomy & histology, Vocal Cords/physiology, Algorithms, Finite Element Analysis, Humans, Three-Dimensional Printing, Speech Acoustics, Speech Production Measurement